Introduction¶
In this notebook we will explore the extracted features from the WESAD dataset.
%reload_ext pretty_jupyter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics
import sklearn.feature_selection as fs
import seaborn as sns
import plotly_express as px
import plotly.offline as pyo
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)
pyo.init_notebook_mode()
General Analysis¶
First, we import the dataset.
data = pd.read_csv('../data/03_primary/WESAD/combined_subjects.csv')
Data Preview
| Unnamed: 0 | net_acc_mean | net_acc_std | net_acc_min | net_acc_max | EDA_phasic_mean | EDA_phasic_std | EDA_phasic_min | EDA_phasic_max | EDA_smna_mean | EDA_smna_std | EDA_smna_min | EDA_smna_max | EDA_tonic_mean | EDA_tonic_std | EDA_tonic_min | EDA_tonic_max | BVP_mean | BVP_std | BVP_min | BVP_max | TEMP_mean | TEMP_std | TEMP_min | TEMP_max | ACC_x_mean | ACC_x_std | ACC_x_min | ACC_x_max | ACC_y_mean | ACC_y_std | ACC_y_min | ACC_y_max | ACC_z_mean | ACC_z_std | ACC_z_min | ACC_z_max | 0_mean | 0_std | 0_min | 0_max | BVP_peak_freq | TEMP_slope | subject | label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.331891 | 0.153556 | 1.014138 | 1.678399 | 2.247876 | 1.112076 | 0.367977 | 4.459367 | 1.592308 | 2.645333 | 3.096905e-08 | 17.418821 | 0.608263 | 1.212010 | -1.213173 | 2.554750 | -0.043934 | 112.391233 | -392.28 | 554.77 | 35.816000 | 0.017436 | 35.77 | 35.87 | 0.024658 | 0.018284 | -0.037843 | 0.087383 | 0.000017 | 0.000013 | -0.000026 | 0.000060 | 0.000017 | 0.000013 | -0.000026 | 0.000060 | 0.027558 | 0.013523 | 0.000000 | 0.087383 | 0.080556 | -0.000102 | 2 | 1 |
| 1 | 1 | 1.218994 | 0.090108 | 1.014138 | 1.485800 | 1.781323 | 1.203991 | 0.232625 | 4.459367 | 1.347750 | 2.666659 | 3.096905e-08 | 17.418821 | 0.731985 | 1.171627 | -1.213173 | 2.477276 | -1.189267 | 120.431399 | -392.28 | 554.77 | 35.796111 | 0.029522 | 35.75 | 35.87 | 0.020313 | 0.019242 | -0.037843 | 0.087383 | 0.000014 | 0.000013 | -0.000026 | 0.000060 | 0.000014 | 0.000013 | -0.000026 | 0.000060 | 0.023420 | 0.015310 | 0.000000 | 0.087383 | 0.144444 | -0.000424 | 2 | 1 |
| 2 | 2 | 1.143312 | 0.110987 | 0.948835 | 1.485800 | 1.173169 | 1.285422 | 0.006950 | 4.459367 | 0.752335 | 1.958546 | 3.096905e-08 | 17.418821 | 1.110242 | 1.112268 | -1.213173 | 2.037179 | 0.280427 | 87.571000 | -357.53 | 371.12 | 35.763056 | 0.044673 | 35.68 | 35.87 | 0.016618 | 0.015316 | -0.021330 | 0.071558 | 0.000011 | 0.000011 | -0.000015 | 0.000049 | 0.000011 | 0.000011 | -0.000015 | 0.000049 | 0.018759 | 0.012604 | 0.000000 | 0.071558 | 0.102778 | -0.000814 | 2 | 1 |
| 3 | 3 | 1.020669 | 0.135308 | 0.811090 | 1.239944 | 0.311656 | 0.278650 | 0.006950 | 1.303071 | 0.198576 | 0.413802 | 3.309990e-08 | 2.788862 | 1.598995 | 0.350355 | 0.959752 | 2.037179 | 0.055833 | 68.797466 | -345.19 | 359.57 | 35.725000 | 0.033491 | 35.66 | 35.81 | 0.022681 | 0.012560 | -0.006881 | 0.054356 | 0.000016 | 0.000009 | -0.000005 | 0.000037 | 0.000016 | 0.000009 | -0.000005 | 0.000037 | 0.022888 | 0.012180 | 0.000688 | 0.054356 | 0.108333 | -0.000524 | 2 | 1 |
| 4 | 4 | 0.887458 | 0.116048 | 0.727406 | 1.125306 | 0.163826 | 0.110277 | 0.006950 | 0.369298 | 0.118080 | 0.237575 | 2.787285e-08 | 1.300810 | 1.342085 | 0.405980 | 0.945946 | 2.037179 | 0.096681 | 43.606312 | -289.26 | 209.89 | 35.701333 | 0.022420 | 35.66 | 35.75 | 0.028105 | 0.010415 | 0.002752 | 0.054356 | 0.000019 | 0.000007 | 0.000002 | 0.000037 | 0.000019 | 0.000007 | 0.000002 | 0.000037 | 0.028105 | 0.010415 | 0.002752 | 0.054356 | 0.147222 | -0.000165 | 2 | 1 |
| 5 | 5 | 0.776920 | 0.071154 | 0.681346 | 0.956575 | 0.155098 | 0.115413 | 0.002306 | 0.369298 | 0.113253 | 0.233061 | 2.787285e-08 | 1.289171 | 1.015119 | 0.158530 | 0.817326 | 1.513996 | -0.642795 | 52.948702 | -289.26 | 209.89 | 35.705056 | 0.023058 | 35.66 | 35.75 | 0.034358 | 0.004849 | 0.002752 | 0.054356 | 0.000024 | 0.000003 | 0.000002 | 0.000037 | 0.000024 | 0.000003 | 0.000002 | 0.000037 | 0.034358 | 0.004849 | 0.002752 | 0.054356 | 0.138889 | 0.000261 | 2 | 1 |
| 6 | 6 | 0.705557 | 0.055554 | 0.608254 | 0.819336 | 0.080122 | 0.092646 | 0.002306 | 0.319375 | 0.048063 | 0.151028 | 2.787285e-08 | 1.105898 | 0.873283 | 0.105136 | 0.656496 | 1.013622 | -0.037437 | 41.045187 | -199.01 | 194.12 | 35.721444 | 0.028090 | 35.66 | 35.77 | 0.031188 | 0.004681 | 0.013761 | 0.039907 | 0.000021 | 0.000003 | 0.000009 | 0.000027 | 0.000021 | 0.000003 | 0.000009 | 0.000027 | 0.031188 | 0.004681 | 0.013761 | 0.039907 | 0.138889 | 0.000460 | 2 | 1 |
| 7 | 7 | 0.639991 | 0.054349 | 0.543110 | 0.725169 | 0.022266 | 0.034928 | 0.000015 | 0.132781 | 0.016674 | 0.090613 | 5.174644e-08 | 0.997037 | 0.732013 | 0.147837 | 0.460235 | 0.999065 | -0.083809 | 35.416182 | -197.37 | 194.12 | 35.753111 | 0.029950 | 35.71 | 35.81 | 0.029377 | 0.004256 | 0.013761 | 0.038531 | 0.000020 | 0.000003 | 0.000009 | 0.000027 | 0.000020 | 0.000003 | 0.000009 | 0.000027 | 0.029377 | 0.004256 | 0.013761 | 0.038531 | 0.152778 | 0.000516 | 2 | 1 |
| 8 | 8 | 0.580220 | 0.054845 | 0.486494 | 0.685270 | 0.024059 | 0.037475 | 0.000015 | 0.167825 | 0.025170 | 0.089431 | 3.297693e-08 | 0.601262 | 0.548576 | 0.180334 | 0.146098 | 0.816318 | 0.548538 | 57.092149 | -367.11 | 363.29 | 35.783667 | 0.033894 | 35.73 | 35.84 | 0.027603 | 0.007144 | -0.002752 | 0.066053 | 0.000019 | 0.000005 | -0.000002 | 0.000045 | 0.000019 | 0.000005 | -0.000002 | 0.000045 | 0.027618 | 0.007088 | 0.000000 | 0.066053 | 0.152778 | 0.000593 | 2 | 1 |
| 9 | 9 | 0.532770 | 0.036903 | 0.474375 | 0.607551 | 0.165363 | 0.216325 | 0.000015 | 0.669836 | 0.152681 | 0.475520 | 3.284132e-08 | 3.622407 | 0.263263 | 0.287734 | -0.202700 | 0.653034 | -0.310028 | 96.934155 | -670.20 | 363.29 | 35.814722 | 0.028076 | 35.75 | 35.87 | 0.028278 | 0.010877 | -0.030962 | 0.074998 | 0.000019 | 0.000007 | -0.000021 | 0.000052 | 0.000019 | 0.000007 | -0.000021 | 0.000052 | 0.028672 | 0.009792 | 0.000000 | 0.074998 | 0.122222 | 0.000447 | 2 | 1 |
We can observe that the all the data is numeric and there are no missing values. We will remove the first column as it is just a clone of the index.
data = data.drop([data.columns[0]], axis=1)
Modified Data Preview
| net_acc_mean | net_acc_std | net_acc_min | net_acc_max | EDA_phasic_mean | EDA_phasic_std | EDA_phasic_min | EDA_phasic_max | EDA_smna_mean | EDA_smna_std | EDA_smna_min | EDA_smna_max | EDA_tonic_mean | EDA_tonic_std | EDA_tonic_min | EDA_tonic_max | BVP_mean | BVP_std | BVP_min | BVP_max | TEMP_mean | TEMP_std | TEMP_min | TEMP_max | ACC_x_mean | ACC_x_std | ACC_x_min | ACC_x_max | ACC_y_mean | ACC_y_std | ACC_y_min | ACC_y_max | ACC_z_mean | ACC_z_std | ACC_z_min | ACC_z_max | 0_mean | 0_std | 0_min | 0_max | BVP_peak_freq | TEMP_slope | subject | label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.331891 | 0.153556 | 1.014138 | 1.678399 | 2.247876 | 1.112076 | 0.367977 | 4.459367 | 1.592308 | 2.645333 | 3.096905e-08 | 17.418821 | 0.608263 | 1.212010 | -1.213173 | 2.554750 | -0.043934 | 112.391233 | -392.28 | 554.77 | 35.816000 | 0.017436 | 35.77 | 35.87 | 0.024658 | 0.018284 | -0.037843 | 0.087383 | 0.000017 | 0.000013 | -0.000026 | 0.000060 | 0.000017 | 0.000013 | -0.000026 | 0.000060 | 0.027558 | 0.013523 | 0.000000 | 0.087383 | 0.080556 | -0.000102 | 2 | 1 |
| 1 | 1.218994 | 0.090108 | 1.014138 | 1.485800 | 1.781323 | 1.203991 | 0.232625 | 4.459367 | 1.347750 | 2.666659 | 3.096905e-08 | 17.418821 | 0.731985 | 1.171627 | -1.213173 | 2.477276 | -1.189267 | 120.431399 | -392.28 | 554.77 | 35.796111 | 0.029522 | 35.75 | 35.87 | 0.020313 | 0.019242 | -0.037843 | 0.087383 | 0.000014 | 0.000013 | -0.000026 | 0.000060 | 0.000014 | 0.000013 | -0.000026 | 0.000060 | 0.023420 | 0.015310 | 0.000000 | 0.087383 | 0.144444 | -0.000424 | 2 | 1 |
| 2 | 1.143312 | 0.110987 | 0.948835 | 1.485800 | 1.173169 | 1.285422 | 0.006950 | 4.459367 | 0.752335 | 1.958546 | 3.096905e-08 | 17.418821 | 1.110242 | 1.112268 | -1.213173 | 2.037179 | 0.280427 | 87.571000 | -357.53 | 371.12 | 35.763056 | 0.044673 | 35.68 | 35.87 | 0.016618 | 0.015316 | -0.021330 | 0.071558 | 0.000011 | 0.000011 | -0.000015 | 0.000049 | 0.000011 | 0.000011 | -0.000015 | 0.000049 | 0.018759 | 0.012604 | 0.000000 | 0.071558 | 0.102778 | -0.000814 | 2 | 1 |
| 3 | 1.020669 | 0.135308 | 0.811090 | 1.239944 | 0.311656 | 0.278650 | 0.006950 | 1.303071 | 0.198576 | 0.413802 | 3.309990e-08 | 2.788862 | 1.598995 | 0.350355 | 0.959752 | 2.037179 | 0.055833 | 68.797466 | -345.19 | 359.57 | 35.725000 | 0.033491 | 35.66 | 35.81 | 0.022681 | 0.012560 | -0.006881 | 0.054356 | 0.000016 | 0.000009 | -0.000005 | 0.000037 | 0.000016 | 0.000009 | -0.000005 | 0.000037 | 0.022888 | 0.012180 | 0.000688 | 0.054356 | 0.108333 | -0.000524 | 2 | 1 |
| 4 | 0.887458 | 0.116048 | 0.727406 | 1.125306 | 0.163826 | 0.110277 | 0.006950 | 0.369298 | 0.118080 | 0.237575 | 2.787285e-08 | 1.300810 | 1.342085 | 0.405980 | 0.945946 | 2.037179 | 0.096681 | 43.606312 | -289.26 | 209.89 | 35.701333 | 0.022420 | 35.66 | 35.75 | 0.028105 | 0.010415 | 0.002752 | 0.054356 | 0.000019 | 0.000007 | 0.000002 | 0.000037 | 0.000019 | 0.000007 | 0.000002 | 0.000037 | 0.028105 | 0.010415 | 0.002752 | 0.054356 | 0.147222 | -0.000165 | 2 | 1 |
| 5 | 0.776920 | 0.071154 | 0.681346 | 0.956575 | 0.155098 | 0.115413 | 0.002306 | 0.369298 | 0.113253 | 0.233061 | 2.787285e-08 | 1.289171 | 1.015119 | 0.158530 | 0.817326 | 1.513996 | -0.642795 | 52.948702 | -289.26 | 209.89 | 35.705056 | 0.023058 | 35.66 | 35.75 | 0.034358 | 0.004849 | 0.002752 | 0.054356 | 0.000024 | 0.000003 | 0.000002 | 0.000037 | 0.000024 | 0.000003 | 0.000002 | 0.000037 | 0.034358 | 0.004849 | 0.002752 | 0.054356 | 0.138889 | 0.000261 | 2 | 1 |
| 6 | 0.705557 | 0.055554 | 0.608254 | 0.819336 | 0.080122 | 0.092646 | 0.002306 | 0.319375 | 0.048063 | 0.151028 | 2.787285e-08 | 1.105898 | 0.873283 | 0.105136 | 0.656496 | 1.013622 | -0.037437 | 41.045187 | -199.01 | 194.12 | 35.721444 | 0.028090 | 35.66 | 35.77 | 0.031188 | 0.004681 | 0.013761 | 0.039907 | 0.000021 | 0.000003 | 0.000009 | 0.000027 | 0.000021 | 0.000003 | 0.000009 | 0.000027 | 0.031188 | 0.004681 | 0.013761 | 0.039907 | 0.138889 | 0.000460 | 2 | 1 |
| 7 | 0.639991 | 0.054349 | 0.543110 | 0.725169 | 0.022266 | 0.034928 | 0.000015 | 0.132781 | 0.016674 | 0.090613 | 5.174644e-08 | 0.997037 | 0.732013 | 0.147837 | 0.460235 | 0.999065 | -0.083809 | 35.416182 | -197.37 | 194.12 | 35.753111 | 0.029950 | 35.71 | 35.81 | 0.029377 | 0.004256 | 0.013761 | 0.038531 | 0.000020 | 0.000003 | 0.000009 | 0.000027 | 0.000020 | 0.000003 | 0.000009 | 0.000027 | 0.029377 | 0.004256 | 0.013761 | 0.038531 | 0.152778 | 0.000516 | 2 | 1 |
| 8 | 0.580220 | 0.054845 | 0.486494 | 0.685270 | 0.024059 | 0.037475 | 0.000015 | 0.167825 | 0.025170 | 0.089431 | 3.297693e-08 | 0.601262 | 0.548576 | 0.180334 | 0.146098 | 0.816318 | 0.548538 | 57.092149 | -367.11 | 363.29 | 35.783667 | 0.033894 | 35.73 | 35.84 | 0.027603 | 0.007144 | -0.002752 | 0.066053 | 0.000019 | 0.000005 | -0.000002 | 0.000045 | 0.000019 | 0.000005 | -0.000002 | 0.000045 | 0.027618 | 0.007088 | 0.000000 | 0.066053 | 0.152778 | 0.000593 | 2 | 1 |
| 9 | 0.532770 | 0.036903 | 0.474375 | 0.607551 | 0.165363 | 0.216325 | 0.000015 | 0.669836 | 0.152681 | 0.475520 | 3.284132e-08 | 3.622407 | 0.263263 | 0.287734 | -0.202700 | 0.653034 | -0.310028 | 96.934155 | -670.20 | 363.29 | 35.814722 | 0.028076 | 35.75 | 35.87 | 0.028278 | 0.010877 | -0.030962 | 0.074998 | 0.000019 | 0.000007 | -0.000021 | 0.000052 | 0.000019 | 0.000007 | -0.000021 | 0.000052 | 0.028672 | 0.009792 | 0.000000 | 0.074998 | 0.122222 | 0.000447 | 2 | 1 |
Dataset Column Overview¶
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2091 entries, 0 to 2090 Data columns (total 44 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 net_acc_mean 2091 non-null float64 1 net_acc_std 2091 non-null float64 2 net_acc_min 2091 non-null float64 3 net_acc_max 2091 non-null float64 4 EDA_phasic_mean 2091 non-null float64 5 EDA_phasic_std 2091 non-null float64 6 EDA_phasic_min 2091 non-null float64 7 EDA_phasic_max 2091 non-null float64 8 EDA_smna_mean 2091 non-null float64 9 EDA_smna_std 2091 non-null float64 10 EDA_smna_min 2091 non-null float64 11 EDA_smna_max 2091 non-null float64 12 EDA_tonic_mean 2091 non-null float64 13 EDA_tonic_std 2091 non-null float64 14 EDA_tonic_min 2091 non-null float64 15 EDA_tonic_max 2091 non-null float64 16 BVP_mean 2091 non-null float64 17 BVP_std 2091 non-null float64 18 BVP_min 2091 non-null float64 19 BVP_max 2091 non-null float64 20 TEMP_mean 2091 non-null float64 21 TEMP_std 2091 non-null float64 22 TEMP_min 2091 non-null float64 23 TEMP_max 2091 non-null float64 24 ACC_x_mean 2091 non-null float64 25 ACC_x_std 2091 non-null float64 26 ACC_x_min 2091 non-null float64 27 ACC_x_max 2091 non-null float64 28 ACC_y_mean 2091 non-null float64 29 ACC_y_std 2091 non-null float64 30 ACC_y_min 2091 non-null float64 31 ACC_y_max 2091 non-null float64 32 ACC_z_mean 2091 non-null float64 33 ACC_z_std 2091 non-null float64 34 ACC_z_min 2091 non-null float64 35 ACC_z_max 2091 non-null float64 36 0_mean 2091 non-null float64 37 0_std 2091 non-null float64 38 0_min 2091 non-null float64 39 0_max 2091 non-null float64 40 BVP_peak_freq 2091 non-null float64 41 TEMP_slope 2091 non-null float64 42 subject 2091 non-null int64 43 label 2091 non-null int64 dtypes: float64(42), int64(2) memory usage: 718.9 KB
Dataset Shape¶
Descriptive Statistics¶
Now we will explore the data. We will start by looking at the distribution of the features.
| net_acc_mean | net_acc_std | net_acc_min | net_acc_max | EDA_phasic_mean | EDA_phasic_std | EDA_phasic_min | EDA_phasic_max | EDA_smna_mean | EDA_smna_std | EDA_smna_min | EDA_smna_max | EDA_tonic_mean | EDA_tonic_std | EDA_tonic_min | EDA_tonic_max | BVP_mean | BVP_std | BVP_min | BVP_max | TEMP_mean | TEMP_std | TEMP_min | TEMP_max | ACC_x_mean | ACC_x_std | ACC_x_min | ACC_x_max | ACC_y_mean | ACC_y_std | ACC_y_min | ACC_y_max | ACC_z_mean | ACC_z_std | ACC_z_min | ACC_z_max | 0_mean | 0_std | 0_min | 0_max | BVP_peak_freq | TEMP_slope | subject | label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2.091000e+03 | 2091.000000 | 2091.000000 | 2091.000000 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2091.000000 | 2.091000e+03 | 2.091000e+03 | 2.091000e+03 | 2091.000000 | 2.091000e+03 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 | 2091.000000 |
| mean | 1.966550 | 0.053456 | 1.855260 | 2.089283 | 1.700688e-01 | 1.118879e-01 | 3.403359e-02 | 4.413895e-01 | 1.300774e-01 | 2.672041e-01 | 6.123098e-08 | 1.826972e+00 | -0.041869 | 0.113459 | -0.255890 | 0.108671 | -0.000153 | 49.919955 | -270.200014 | 240.117805 | 33.018114 | 0.026576 | 32.967303 | 33.068604 | 0.011463 | 4.462103e-03 | -0.007762 | 0.030142 | 0.000008 | 3.070164e-06 | -5.340943e-06 | 2.073911e-05 | 0.000008 | 3.070164e-06 | -5.340943e-06 | 2.073911e-05 | 0.029561 | 4.187458e-03 | 0.014872 | 0.048907 | 0.125485 | -0.000015 | 9.404113 | 1.140603 |
| std | 2.657218 | 0.091732 | 2.496197 | 2.789401 | 5.482836e-01 | 4.572540e-01 | 1.063828e-01 | 1.489958e+00 | 4.227190e-01 | 7.230139e-01 | 4.659428e-08 | 5.546362e+00 | 1.226109 | 0.438957 | 1.669225 | 1.201522 | 0.573221 | 40.131618 | 238.855851 | 209.107222 | 1.470879 | 0.020533 | 1.467757 | 1.475477 | 0.028678 | 4.229046e-03 | 0.034671 | 0.032306 | 0.000020 | 2.909809e-06 | 2.385519e-05 | 2.222845e-05 | 0.000020 | 2.909809e-06 | 2.385519e-05 | 2.222845e-05 | 0.009287 | 3.689155e-03 | 0.013045 | 0.018166 | 0.039913 | 0.000565 | 4.706482 | 0.661542 |
| min | 0.091182 | 0.000742 | 0.074363 | 0.100672 | 1.135074e-07 | 1.525014e-08 | 6.445254e-08 | 1.709373e-07 | 8.388991e-08 | 1.827182e-08 | 3.479847e-09 | 1.532876e-07 | -10.033692 | 0.000257 | -25.222599 | -2.216655 | -5.428135 | 2.834831 | -1617.860000 | 7.270000 | 29.381111 | 0.007700 | 29.330000 | 29.430000 | -0.044579 | 5.759282e-16 | -0.088071 | -0.040595 | -0.000031 | 6.403569e-19 | -6.059740e-05 | -2.793162e-05 | -0.000031 | 6.403569e-19 | -6.059740e-05 | -2.793162e-05 | 0.000555 | 8.326673e-16 | 0.000000 | 0.004128 | 0.025000 | -0.003220 | 2.000000 | 0.000000 |
| 25% | 0.307707 | 0.004405 | 0.292997 | 0.321858 | 4.294791e-03 | 5.038863e-03 | 8.542278e-06 | 2.139904e-02 | 3.276647e-03 | 1.583230e-02 | 2.948547e-08 | 1.359000e-01 | -0.795608 | 0.010061 | -0.889842 | -0.753547 | -0.148356 | 23.048587 | -352.120000 | 94.575000 | 32.274222 | 0.014900 | 32.230000 | 32.310000 | -0.020900 | 9.221599e-04 | -0.035779 | 0.000688 | -0.000014 | 6.344950e-07 | -2.461769e-05 | 4.734172e-07 | -0.000014 | 6.344950e-07 | -2.461769e-05 | 4.734172e-07 | 0.023266 | 9.206196e-04 | 0.002064 | 0.037843 | 0.097222 | -0.000290 | 5.000000 | 1.000000 |
| 50% | 0.846770 | 0.017041 | 0.772491 | 0.938149 | 2.018820e-02 | 2.109368e-02 | 3.296700e-04 | 8.417205e-02 | 1.534513e-02 | 5.711463e-02 | 4.831052e-08 | 5.035151e-01 | -0.498252 | 0.034663 | -0.525585 | -0.440061 | -0.003823 | 38.391360 | -197.370000 | 189.240000 | 33.166889 | 0.019317 | 33.130000 | 33.230000 | 0.024675 | 3.285789e-03 | 0.000688 | 0.040595 | 0.000017 | 2.260798e-06 | 4.734172e-07 | 2.793162e-05 | 0.000017 | 2.260798e-06 | 4.734172e-07 | 2.793162e-05 | 0.030536 | 3.228065e-03 | 0.012385 | 0.048164 | 0.127778 | -0.000053 | 9.000000 | 1.000000 |
| 75% | 2.665476 | 0.063217 | 2.516406 | 2.884861 | 1.618663e-01 | 9.929476e-02 | 1.428761e-02 | 4.578447e-01 | 1.231743e-01 | 2.893434e-01 | 7.709707e-08 | 1.908303e+00 | 0.735160 | 0.096417 | 0.565363 | 0.989545 | 0.147601 | 64.203423 | -100.530000 | 323.660000 | 34.011000 | 0.029580 | 33.950000 | 34.070000 | 0.036200 | 6.795670e-03 | 0.021330 | 0.051948 | 0.000025 | 4.675783e-06 | 1.467593e-05 | 3.574300e-05 | 0.000025 | 4.675783e-06 | 1.467593e-05 | 3.574300e-05 | 0.037319 | 6.645341e-03 | 0.025458 | 0.057796 | 0.150000 | 0.000171 | 14.000000 | 2.000000 |
| max | 15.632220 | 1.130964 | 14.720361 | 15.931444 | 1.197433e+01 | 1.044126e+01 | 1.838081e+00 | 2.963154e+01 | 9.223967e+00 | 1.419266e+01 | 2.929251e-07 | 1.172344e+02 | 3.028557 | 9.991237 | 2.890934 | 3.291220 | 4.628719 | 320.678627 | -9.280000 | 1789.000000 | 35.933111 | 0.193635 | 35.910000 | 35.970000 | 0.043367 | 2.607680e-02 | 0.043347 | 0.087383 | 0.000030 | 1.794223e-05 | 2.982528e-05 | 6.012398e-05 | 0.000030 | 1.794223e-05 | 2.982528e-05 | 6.012398e-05 | 0.044579 | 1.874508e-02 | 0.043347 | 0.088071 | 0.319444 | 0.003682 | 17.000000 | 2.000000 |
Feature selection¶
After loading and observing the dataset, it's time to find best features. Before doing this, we look at all the features available in the dataset.
data_ = data.copy()
cdf = pd.concat([data_.drop("label", axis=1), pd.get_dummies(data_["label"])], axis=1)
cdf.rename(columns={0: "amusement", 1: "baseline", 2: "stress"}, inplace=True)
corr = cdf.corr()
fig = px.imshow(corr[["amusement", "baseline", "stress"]], text_auto=True, color_continuous_scale=px.colors.sequential.Viridis)
fig = fig.update_layout(width=500, height=3800)
fig.show()
After looking at all features, we use SelectKBest from the sklearn library to find five best features.
# def get_best_features()
kBest = fs.SelectKBest(fs.f_classif, k=5)
res = kBest.fit_transform(data.drop(columns=['label']), data['label'])
filter = kBest.get_support()
df = pd.DataFrame(res, columns = data.columns[:-1][filter])
df = df.join(data['label'])
print(f"""Top features:\n{" ".join(data.columns[:-1][filter])}""")
Top features: net_acc_std net_acc_max EDA_tonic_mean EDA_tonic_min EDA_tonic_max
cdf = pd.concat([df.drop("label", axis=1), pd.get_dummies(df["label"])], axis=1)
cdf.rename(columns={0: "amusement", 1: "baseline", 2: "stress"}, inplace=True)
corr = cdf.corr()
fig = px.imshow(corr[["amusement", "baseline", "stress"]], text_auto=True, color_continuous_scale=px.colors.sequential.Viridis)
fig = fig.update_layout(width=500, height=800)
fig.show()
Data analysis by subject¶
In this section we plot top five features per subject to see if they will show the same pattern.
from plotly.subplots import make_subplots
import plotly.graph_objects as go
def plot_distribution(feature, nbr_cols=4):
subjects = [2, 3, 4, 5, 6 ,7 ,8, 9, 10, 11, 13, 14, 15, 16, 17]
titles = [f'Subject {x}' for x in subjects]
plot = make_subplots(rows=len(subjects) // nbr_cols+1, cols=nbr_cols, subplot_titles=titles)
row_n = 1
col_n = 1
for sub in subjects:
csv = pd.read_csv(f'../data/03_primary/WESAD/subject_feats/S{sub}_feats_4.csv')
plot.add_trace(go.Bar(y=csv[feature]), row_n, col_n, )
col_n += 1
if col_n > nbr_cols:
col_n = 1
row_n += 1
plot.update_layout(height=1000)
plot.show()
net_acc_std¶
plot_distribution('net_acc_std')
net_acc_max¶
plot_distribution('net_acc_max')
EDA_tonic_mean¶
plot_distribution('EDA_tonic_mean')
EDA_tonic_min¶
plot_distribution('EDA_tonic_min')
EDA_tonic_max¶
plot_distribution('EDA_tonic_max')
The effect of the stress level¶
Finally, we will compare all three states (baseline, stress and amusement) to see how the values differentiate between the states.
First, we prepare the combined dataset.
def get_label(label):
frame = df.loc[df.label==label]
frame.index = range(0, frame.shape[0])
frame.index = pd.to_datetime(frame.index, unit='s')
return frame
amusement = get_label(0)
baseline = get_label(1)
stress = get_label(2)
df.label = df.label.replace(0, 'amusement')
df.label = df.label.replace(1, 'baseline')
df.label = df.label.replace(2, 'stress')
Next, we define functions to plot data per state for each subject.
def split_states(data_, label_):
print(data_.head())
frame = data_.loc[data_.label==label_]
frame.index = range(0, frame.shape[0])
frame.index = pd.to_datetime(frame.index, unit='s')
return frame
def plot_states(feature, nbr_cols=3):
subjects = [2, 3, 4, 5, 6 ,7 ,8, 9, 10, 11, 13, 14, 15, 16, 17]
titles = [f'Subject {x}' for x in subjects]
plot = make_subplots(rows=5, cols=nbr_cols, subplot_titles=titles)
row_n = 1
col_n = 1
for sub in subjects:
csv = pd.read_csv(f'../data/03_primary/WESAD/subject_feats/S{sub}_feats_4.csv')
csv['label'] = csv[['0','1','2']].idxmax(axis=1).astype('int32')
plot.add_trace(go.Scatter(y=csv.loc[csv.label == 0][feature], line = {'color': '#00cc96'}, mode='lines', name = 'amusement'), row=row_n, col=col_n)
plot.add_trace(go.Scatter(y=csv.loc[csv.label == 1][feature], line = {'color': '#ef553b'}, mode='lines', name = 'baseline'), row=row_n, col=col_n)
plot.add_trace(go.Scatter(y=csv.loc[csv.label == 2][feature], line = {'color': '#19d3f3'}, mode='lines', name = 'stress'), row=row_n, col=col_n)
col_n += 1
if col_n > nbr_cols:
col_n = 1
row_n += 1
tmp = 0
for trace in plot['data']:
tmp += 1
if tmp > 3:
trace['showlegend'] = False
plot.update_layout(height=1500, title_text="Per Subject Data")
plot.show()
net_acc_std¶
fig = px.line(df, y='net_acc_std', color='label', title="Combined Data")
fig.show()
After looking at the combined graph, let's see how it looks separately for each subject.
plot_states('net_acc_std')
net_acc_max¶
fig = px.line(df, y='net_acc_max', color='label', title="Combined Data")
fig.show()
After looking at the combined graph, let's see how it looks separately for each subject.
plot_states('net_acc_max')
EDA_tonic_mean¶
fig = px.line(df, y='EDA_tonic_mean', color='label', title="Combined Data")
fig.show()
After looking at the combined graph, let's see how it looks separately for each subject.
plot_states('EDA_tonic_mean')
EDA_tonic_min¶
fig = px.line(df, y='EDA_tonic_min', color='label', title="Combined Data")
fig.show()
After looking at the combined graph, let's see how it looks separately for each subject.
plot_states('EDA_tonic_min')
EDA_tonic_max¶
fig = px.line(df, y='EDA_tonic_max', color='label', title="Combined Data")
fig.show()
After looking at the combined graph, let's see how it looks separately for each subject.
plot_states('EDA_tonic_max')
Conclusion¶
After applying the sliding window to the dataset, we can conclude that the characteristics of the data did not change that much. Best correlating features are the same as in the initial 30s dataset and graphs of all best features show the same patterns.
From observing the graphs above we can conclude that for the EDA_tonic features for almost all the subjects, except for subject 17, have a noticeably higher and fluctuating values when they are stressed. Subject 17 anomaly could be explained by the fact that "the sensor was not fully attached throughout the entire duration of the study protocol", as it is stated in the dataset files.
In the case of the net_acc_std feature we can see that there are way more in the readings of stress patients and the values are generally higher, the difference in the values of stressed and baseline periods can be observed more clearly in the net_acc_max feature graphs as there aren't any big oscillations.
We can conclude that even a basic Machine Learning model, or a thresholding algorithm will be able to classify the stress levels with a high accuracy, because of the clear difference between the features of the different stress levels.